Max Pellert (https://mpellert.at)
Deep Learning for the Social Sciences
From now on, please find all class materials in this repo: https://github.com/DLSS-24/DLSS-24
| Date | Topic | Who? |
|---|---|---|
| 9.4. | Logistics & Motivation | Max |
| 16.4. | Supervised Learning | Max |
| 23.4. | Shallow Neural Nets | Max |
| 30.4. | Perceptron and Multi-Layer Perceptrons | Giordano |
| 7.5. | Convolutional Neural Networks | Giordano |
| 14.5. | Graph Neural Networks | Giordano |
| 21.5. | NNs for Time Series Analysis | Giordano |
| Date | Topic | Who? |
|---|---|---|
| 28.5. | No class | |
| 4.6. | Generative Deep Learning 1 | Giordano |
| 11.6. | NLP 1 | Max |
| 18.6. | NLP 2 | Max |
| 25.6. | Reinforcement Learning | Giordano |
| 2.7. | Project presentation session | Max |
| 9.7. | Large Language Models | Giordano |
| 16.7. | Generative Deep Learning 2 | Max |
Univariate regression problem (one output, real value)
Supervised learning model = mapping from one or more inputs to one or more outputs
Computing the outputs from the inputs = inference
Example:
Input is age and mileage of secondhand Toyota Prius
Output is estimated price of car
A model is a mathematical equation; better and more generally, a model is a family of equations
Model includes parameters
Parameters affect outcome of equations
Training a model = finding parameters that predict outputs “well” from inputs for a training dataset of input/output pairs
Check Appendix A of “Understanding Deep Learning”
| | Notation convention | Example |
|---|---|---|
| Input | Variables are always indicated with Roman letters: normal = scalar, **bold** = vector, **CAPITAL BOLD** = matrix | Structured or tabular data |
| Output | Same conventions as for inputs | |
| Model | Functions are always indicated with square brackets: normal = returns scalar, **bold** = returns vector, **CAPITAL BOLD** = returns matrix | |
| Parameters | Parameters are always Greek letters | |
We use a training dataset of $I$ pairs of input/output examples: $\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{I}$
Loss function or cost function measures how bad the model is at relating input to output for the examples: $L[\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{I}, \boldsymbol{\phi}]$
Or short: $L[\boldsymbol{\phi}]$
Loss function: returns a scalar that is smaller when the model maps inputs to outputs better
During training, we try to find the parameters that minimize the loss: $\hat{\boldsymbol{\phi}} = \operatorname{argmin}_{\boldsymbol{\phi}} \, L[\boldsymbol{\phi}]$
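These definitions can be sketched in a few lines of Python. The car-price numbers below are invented for illustration, and squared error is used as one concrete choice of loss (the least squares loss discussed later):

```python
# Sketch: a loss function for a toy supervised learning problem.
# The data are made up (age of a used car in years -> price in
# thousands of euros); the model and loss mirror the definitions above.

def model(x, phi):
    """Line model y = phi0 + phi1 * x."""
    phi0, phi1 = phi
    return phi0 + phi1 * x

def loss(phi, data):
    """Squared-error loss: sum of squared deviations over all I pairs."""
    return sum((model(x, phi) - y) ** 2 for x, y in data)

# I = 4 input/output pairs (hypothetical numbers)
data = [(1.0, 20.0), (3.0, 16.0), (5.0, 12.0), (7.0, 8.0)]

# These toy data lie exactly on the line price = 22 - 2 * age,
# so that parameter setting achieves zero loss
print(loss((22.0, -2.0), data))   # 0.0
# A worse parameter setting gives a larger loss
print(loss((22.0, -1.0), data))   # 84.0
```

Training then means searching for the parameter pair with the smallest loss value.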
To test the model, we evaluate it on a separate test dataset of input/output pairs
Crucially, the model must not have seen that data during training (suspiciously high, almost perfect performance on the test set is an indicator that data may have spilled over from training)
Testing allows us to see how well it generalizes to “new data”
Still, the test data is usually from the same domain and collected in the same way as the training data, so external validity can be low although test set performance is high
Always be critical and try to assess performance “in the wild” to establish the model’s limits
This is clearly where your ways of thinking from the social sciences can come in very handy!
Model: $y = f[x, \phi] = \phi_0 + \phi_1 x$ (the line model)
Parameters: $\phi = [\phi_0, \phi_1]^\top$, the y-intercept $\phi_0$ and the slope $\phi_1$
Loss function:
Least squares loss function: $L[\phi] = \sum_{i=1}^{I} \left(f[x_i, \phi] - y_i\right)^2 = \sum_{i=1}^{I} \left(\phi_0 + \phi_1 x_i - y_i\right)^2$
But you can fit the line model in closed form!
True – but only because we have looked at very simple cases so far; we won’t be able to do this for more complex models
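The closed-form least-squares fit of the line model can be sketched in pure Python (the data below are invented for illustration):

```python
# Minimal sketch of the closed-form least-squares solution for the line
# model y = phi0 + phi1 * x (ordinary least squares on toy, made-up data).

def fit_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    phi1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
           / sum((x - x_bar) ** 2 for x in xs)
    phi0 = y_bar - phi1 * x_bar   # intercept
    return phi0, phi1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]        # lies exactly on y = 1 + 2x
print(fit_line(xs, ys))          # (1.0, 2.0)
```

No closed-form solution of this kind exists for the deep networks covered later, which is why iterative training is needed.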
But we could exhaustively try every slope and intercept combo!
True – but we won’t be able to do this when there are a million parameters
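The exhaustive-search idea can be sketched as follows (toy data, invented for illustration); note how the number of grid points explodes with the parameter count:

```python
# Sketch of exhaustive search over a small parameter grid -- workable for
# two parameters, hopeless for millions (toy, made-up data).

def loss(phi0, phi1, data):
    return sum((phi0 + phi1 * x - y) ** 2 for x, y in data)

data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]   # lies on y = 1 + 2x

grid = [i * 0.5 for i in range(-10, 11)]       # candidate values -5.0 .. 5.0
best = min((loss(p0, p1, data), p0, p1) for p0 in grid for p1 in grid)
print(best)   # (0.0, 1.0, 2.0): zero loss at intercept 1, slope 2

# 21 * 21 = 441 loss evaluations for 2 parameters; the grid size grows
# exponentially with the number of parameters (21 ** 1_000_000 for a
# million parameters), so exhaustive search cannot scale.
```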
We test with a different set of paired input/output data to measure performance
Degree to which we get the same performance as in training = generalization
Might not generalize well because the model is too simple
Or the model is too complex
It fits to statistical peculiarities of the specific training data we used, not some “general characteristics”
This is known as overfitting
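Overfitting can be demonstrated with polynomial models on a small, deterministic toy dataset (the data and the degree choices below are invented for illustration; `numpy` is assumed to be available):

```python
# Sketch of over- vs. under-fitting with polynomial models on a toy,
# deterministic dataset: true relationship y = x plus an alternating
# +/-0.2 "noise" pattern standing in for statistical peculiarities.
import numpy as np

x_train = np.linspace(0.0, 1.0, 8)
noise = 0.2 * (-1.0) ** np.arange(8)           # deterministic zigzag
y_train = x_train + noise

x_test = (x_train[:-1] + x_train[1:]) / 2      # points between training inputs
y_test = x_test                                # noise-free true outputs

def mse(phi, x, y):
    """Mean squared error of a polynomial with coefficients phi."""
    return float(np.mean((np.polyval(phi, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, 1)       # line: cannot fit the zigzag
complex_ = np.polyfit(x_train, y_train, 7)     # degree 7: hits all 8 points

print(mse(simple, x_train, y_train), mse(complex_, x_train, y_train))
print(mse(simple, x_test, y_test), mse(complex_, x_test, y_test))
# The complex model has (near-)zero training error but a much larger test
# error: it fit the peculiarities of the training data, i.e. it overfits.
```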
Shallow neural networks (a more flexible model)
Deep neural networks (an even more flexible model)
Loss functions (where did least squares come from?)
How to train neural networks (gradient descent and variants)
How to measure performance of neural networks (generalization)
Still for today: a practical outlook on Word2Vec
A shallow neural net, which we will cover next time
Surprising relationships could be found in vector space by computing similarities between word vectors
“[…] simple algebraic operations are performed on the word vectors, [and] it was shown for example that vector(“King”) - vector(“Man”) + vector(“Woman”) results in a vector that is closest to the vector representation of the word “Queen”.”
Although the approach is by now dated, it has been used in a large number of different (published) studies in the social sciences over the last years (and sometimes still is today)
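The famous analogy can be illustrated with a few hand-made vectors. These are NOT real Word2Vec embeddings: the 3-d numbers below are invented so that the arithmetic works out, whereas real embeddings have 100+ dimensions and are learned from large text corpora (e.g. with gensim’s Word2Vec):

```python
# Toy illustration of vector("King") - vector("Man") + vector("Woman").
# The vectors are hypothetical, hand-crafted 3-d examples, not learned.
from math import sqrt

vectors = {
    "king":  [0.9, 0.8, 0.1],   # hypothetical: "royalty" + "male"
    "queen": [0.9, 0.1, 0.8],   # hypothetical: "royalty" + "female"
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "car":   [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# vector("king") - vector("man") + vector("woman")
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Nearest remaining word (query words excluded) by cosine similarity
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, vectors[w]))
print(best)   # queen
```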